Conversation
…ia clock in webrtc. Signed-off-by: BuffMcBigHuge <marco@bymar.co>
leszko
left a comment
Added some minor comments; I'll review it in more detail soon. But the general code structure looks fine.
The biggest thing is that I'm not sure the synchronization works correctly. @BuffMcBigHuge I tested this on your LTX-2 and found the video completely out of sync with the audio, so the experience is not good. The video pauses while the audio keeps playing, and so on. See the recording I've shared. This is something we need to address.
demo_audio.mp4
        logger.error(f"Error sending NDI frame: {e}")
        return False

    def send_audio(
Nice that you added the audio for NDI as well 🏅
            if not self.pipeline_processors:
                time.sleep(0.01)
                continue
Can we maybe start the audio loop AFTER the pipeline processors are created? Then we wouldn't need to have this check here.
I had struggled with this actually - I'll look into your suggestion.
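For reference, a minimal sketch of what the suggestion could look like (class and method names here are hypothetical, not the actual Scope code): the drain loop is started from the same place that creates the processors, so the loop never observes an empty `pipeline_processors` and the per-iteration check goes away.

```python
import threading
import time


class FrameProcessor:
    """Sketch: start the audio drain loop only after processors exist."""

    def __init__(self):
        self.pipeline_processors = []
        self._audio_thread = None

    def set_pipeline_processors(self, processors):
        # Hypothetical hook: by the time the drain loop starts, the
        # processors are guaranteed to exist, so the loop no longer
        # needs an "if not self.pipeline_processors" poll.
        self.pipeline_processors = processors
        if self._audio_thread is None:
            self._audio_thread = threading.Thread(
                target=self._audio_drain_loop, daemon=True
            )
            self._audio_thread.start()

    def _audio_drain_loop(self):
        while self.pipeline_processors:
            # ... drain audio from the processors here ...
            time.sleep(0.02)
```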
        if isinstance(audio_tensor, torch.Tensor):
            audio_np = audio_tensor.float().numpy()
        else:
            audio_np = np.asarray(audio_tensor, dtype=np.float32)
Do we expect the pipelines to produce audio in two formats? And if not, why do we need this check?
        # Ensure shape is [C, S] (channels, samples)
        if audio_np.ndim == 1:
            audio_np = audio_np[np.newaxis, :]  # mono -> [1, S]

        # Mix down to mono for WebRTC (average channels)
        if audio_np.shape[0] > 1:
            audio_mono = audio_np.mean(axis=0)
        else:
            audio_mono = audio_np[0]
Similar question: we have a lot of checks on the audio tensor's shape. Do we allow pipelines to return different formats?
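One way to address both shape questions at once would be to funnel every pipeline output through a single normalization helper, so the accepted formats are decided (and documented) in one place. A sketch, with a hypothetical helper name; a `torch.Tensor` would need `.detach().cpu().float().numpy()` before reaching it:

```python
import numpy as np


def to_mono_float32(audio) -> np.ndarray:
    """Single entry point for the shape/dtype checks (hypothetical).

    Accepts mono [S] or multi-channel [C, S] input, anything
    np.asarray understands. Returns mono float32 of shape [S].
    """
    audio_np = np.asarray(audio, dtype=np.float32)
    if audio_np.ndim == 1:
        audio_np = audio_np[np.newaxis, :]  # mono -> [1, S]
    if audio_np.shape[0] > 1:
        return audio_np.mean(axis=0)  # average channels down to mono
    return audio_np[0]
```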
        return frame

    def _audio_drain_loop(self):
Nit: I wonder if we could/should move the audio-related code into a separate file, like audio.py. The main reason is that frame_processor.py is getting busy.
        if audio_output is not None and audio_sample_rate is not None:
            # Detach and move to CPU for downstream consumption
            audio_output = audio_output.detach().cpu()
            logger.info(
I think this should be at the debug level, because the logs get too noisy.
        self._audio_buffer.append(audio_mono)
        self._audio_buffer_samples += len(audio_mono)
        logger.info(
            f"[FRAME-PROCESSOR] Audio buffered: {len(audio_mono)} samples "
I think this should be at the debug level, because the logs get too noisy.
j0sh
left a comment
One note on timing / synchronization. Do the audio and video pipelines run independently?
As far as I can tell, media is timestamped right before sending out to WebRTC, in which case there will likely be desync if the pipeline for one track is delayed compared to another. You might want to propagate a reference timestamp at the beginning of both pipelines earlier in the process, but I don't really know this codebase well enough to suggest where.
Re: sync, there might also be more subtle WebRTC usage issues but I am not sure yet; things look mostly fine from what I can tell. WebRTC playback sync is complex and some of the knobs in frameworks like Pion are a little unintuitive; I'm not too familiar with aiortc at the moment.
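A minimal illustration of the reference-timestamp idea above (all names here are hypothetical, not from the Scope codebase): media is stamped on a shared clock as it enters the pipelines, and the pts is derived from that capture time later, so a slow pipeline on one track cannot skew its timestamps relative to the other.

```python
import time
from dataclasses import dataclass


@dataclass
class StampedChunk:
    payload: object
    capture_time: float  # seconds on a clock shared by audio and video


def capture(payload, clock=time.monotonic) -> StampedChunk:
    # Stamp media as it ENTERS the pipelines, not when it leaves for WebRTC.
    return StampedChunk(payload, clock())


def to_pts(chunk: StampedChunk, epoch: float, clock_rate: int) -> int:
    # pts derives from capture time, so pipeline delay on one track
    # does not shift its timestamps relative to the other track.
    return int(round((chunk.capture_time - epoch) * clock_rate))
```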
src/scope/server/tracks.py
Outdated
        media_time = self.media_clock.get_media_time()
        frame.pts = self.media_clock.media_time_to_audio_pts(media_time)
I think the corresponding call for video is missing?
nit: since media_time_to_audio_pts is called immediately after get_media_time, I think this API can be simplified to something like
frame.pts = self.media_clock.to_pts(AUDIO_CLOCK_RATE)
Since AUDIO_CLOCK_RATE is being manually set elsewhere anyway, I am not sure there really need to be separate API entry points for audio / video.
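A sketch of what the collapsed API could look like (the clock-rate constants and class internals here are assumptions, not the actual implementation):

```python
import time

AUDIO_CLOCK_RATE = 48_000  # assumed; matches the 48 kHz audio track
VIDEO_CLOCK_RATE = 90_000  # standard RTP video clock rate


class MediaClock:
    """Sketch of the simplified API: one to_pts(rate) for both tracks."""

    def __init__(self):
        self._start = time.monotonic()

    def get_media_time(self) -> float:
        return time.monotonic() - self._start

    def to_pts(self, clock_rate: int) -> int:
        # get_media_time + media_time_to_audio_pts folded into one call;
        # the caller just supplies its own track's clock rate.
        return int(round(self.get_media_time() * clock_rate))
```

With this, the audio track would do `frame.pts = self.media_clock.to_pts(AUDIO_CLOCK_RATE)` and the video track the same call with `VIDEO_CLOCK_RATE`.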
Audio Support for Scope
Summary
Adds end-to-end audio support to Scope's WebRTC streaming pipeline. Pipelines can now return audio alongside video in their output dict; the server buffers, resamples, and streams audio over WebRTC and NDI. A shared media clock keeps audio and video synchronized.
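A sketch of the extended output contract described above. The keys match the summary; the shapes, sample rate, and function name are illustrative, and the array is shown as NumPy to keep the example self-contained, whereas the real pipelines return torch tensors.

```python
import numpy as np


def pipeline_step(video_frame):
    """Illustrative pipeline step returning audio alongside video."""
    audio_chunk = np.zeros((2, 320), dtype=np.float32)  # [channels, samples]
    return {
        "video": video_frame,
        "audio": audio_chunk,          # optional key
        "audio_sample_rate": 16_000,   # optional key
    }
```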
What's New
Backend
- Pipelines can now return `{"video": ..., "audio": ..., "audio_sample_rate": ...}`. Audio keys are optional; pipelines that don't produce audio are unchanged.
- `audio_output_queue` for audio chunks from the pipeline output dict.
- `get_audio()` drains the buffer for the audio track.
- `get_media_time()` so RTCP Sender Reports map correctly to NTP.
- A `MediaStreamTrack` that produces 20 ms frames at 48 kHz. Returns silence when no audio is available to keep the track alive.
- `NDIOutputSink.send_audio()` sends float32 audio via `NDIlib_send_send_audio_v2`.

Frontend

- Received audio is played via `MediaStream`. Adds a recvonly audio transceiver so the SDP offer includes an audio m-line for the backend to attach its track.

WebRTC Handshake
The browser adds `addTransceiver("audio", { direction: "recvonly" })` so the offer includes an audio m-line. After `setRemoteDescription`, the backend finds the audio transceiver, attaches its `AudioProcessingTrack`, and sets the direction to `sendonly`. The answer then indicates that the server will send audio.

Architecture
Related